resolve and document most common erasure coded pool pain points #3194
Conversation
SUCCESS: make check on fc53dc9, output is http://paste.ubuntu.com/9551698/
Documentation part reviewed by Italo Santos <okdokk@gmail.com>
SUCCESS: make check on b47b333, output is http://paste.ubuntu.com/9553221/
SUCCESS: make check on 4c05213, output is http://paste.ubuntu.com/9554907/
rebased and repushed
FAIL: the output of run-make-check.sh on 4df7a46 is http://paste.pound-python.org/show/EwJFIOR3jzfcKYk3ArDW/
SUCCESS: the output of run-make-check.sh on c3edf67 is http://paste.pound-python.org/show/RcxenPns9Pdz28O1g1iB/
running in gitbuilder
http://tracker.ceph.com/issues/10349

Fixes: #10349
Signed-off-by: Loic Dachary <ldachary@redhat.com>
It is common for people to try to map 9 OSDs out of a cluster with 9 OSDs in total. The default number of tries (50) frequently leads to bad mappings in this case. Raising it to 100 makes no significant difference in CPU cost, as tested manually by running crushtool on one million mappings.

http://tracker.ceph.com/issues/10353

Fixes: #10353
Signed-off-by: Loic Dachary <ldachary@redhat.com>
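The effect of the tries limit can be checked with crushtool's mapping test. A minimal sketch, assuming a crush map has been saved to a local file named `crushmap` (the file name and rule number are placeholders for your own cluster's values):

```shell
# Extract the current crush map from the cluster (binary form):
ceph osd getcrushmap -o crushmap

# Try mapping 9 replicas/chunks with the default retry budget,
# then with the raised one; bad mappings are printed if any occur.
crushtool -i crushmap --test --rule 1 --num-rep 9 \
    --set-choose-total-tries 50 --show-bad-mappings
crushtool -i crushmap --test --rule 1 --num-rep 9 \
    --set-choose-total-tries 100 --show-bad-mappings
```

If the first invocation reports bad mappings and the second does not, the cluster is hitting exactly the pain point this commit addresses.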
The ruleset created for an erasure coded pool has max_size set to a fixed value of 20, which is incorrect when more than 20 chunks are needed and leads to obscure errors. Set it instead to the number of chunks, i.e. k+m in most cases.

In a cluster with few OSDs (9, for instance), setting max_size to 20 also causes performance problems when injecting a new crushmap. The monitor calls CrushTester::test, which tries 1024 mappings for every size from min_size to max_size. Each attempt to map more OSDs than are available exhausts all retries (50 by default) and takes a significant amount of time: in a cluster with 9 OSDs, testing one such ruleset can take up to 5 seconds. Since the test blocks the monitor leader, a few erasure coded rulesets are enough to block the monitor past the timeouts and trigger an election.

http://tracker.ceph.com/issues/10363

Fixes: #10363
Signed-off-by: Loic Dachary <ldachary@redhat.com>
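For illustration, this is roughly what the decompiled rule looks like after the fix, for a hypothetical profile with k=6, m=3 (the rule name, ruleset number, and failure domain are example values, not taken from this PR):

```
rule ecpool {
        ruleset 1
        type erasure
        min_size 3
        max_size 9          # previously a fixed 20; now k+m = 6+3
        step set_chooseleaf_tries 5
        step take default
        step chooseleaf indep 0 type host
        step emit
}
```

With max_size bounded by k+m, CrushTester::test no longer wastes time attempting sizes the pool can never use.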
Add a new section to the PG troubleshooting documentation covering the most common problems reported when an erasure coded pool fails to map PGs to enough OSDs.

http://tracker.ceph.com/issues/10350

Fixes: #10350
Signed-off-by: Loic Dachary <ldachary@redhat.com>
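The kind of diagnosis the new documentation section targets can be sketched with standard ceph commands (run against a live cluster; output shown here is omitted because it depends on the cluster state):

```shell
# Identify PGs that an erasure coded pool cannot map to enough OSDs:
ceph health detail                # lists degraded/undersized PGs
ceph pg dump_stuck unclean        # PGs stuck without a complete acting set

# Check whether the ruleset and profile fit the cluster:
ceph osd crush rule dump          # inspect min_size/max_size of the rules
ceph osd erasure-code-profile get default   # verify k+m <= number of OSDs
```

A profile requiring more chunks than there are OSDs (or failure domains) is the most common cause of the incomplete mappings described above.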
Use different erasure coded pool names and profiles to avoid deletion/creation races. The more expensive alternative is to run a different cluster for each test.

Signed-off-by: Loic Dachary <ldachary@redhat.com>
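One way to give each test its own pool and profile is to derive the names from a unique suffix. A minimal sketch, assuming the shell's PID as the suffix and an example k=2, m=1 profile (both are illustrative choices, not the PR's actual test parameters):

```shell
# Unique per-test suffix so back-to-back tests never race on
# the deletion of one pool against the creation of the next.
id=$$
ceph osd erasure-code-profile set profile-$id k=2 m=1
ceph osd pool create ecpool-$id 12 12 erasure profile-$id

# ... run the test against ecpool-$id ...

ceph osd pool delete ecpool-$id ecpool-$id --yes-i-really-really-mean-it
ceph osd erasure-code-profile rm profile-$id
```

Because every test creates and deletes only its own uniquely named resources, no coordination between tests is needed.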
SUCCESS: the output of run-make-check.sh on centos-centos7 for ac051fe is http://paste2.org/dFMbjVBs
…ries resolve and document most common erasure coded pool pain points

Documentation-Reviewed-by: Italo Santos <okdokk@gmail.com>
No description provided.